In the previous article, we touched on this point: AlertManager receives alert events from the Prometheus Server, and it can integrate third-party or custom notification methods to send out alerts, such as Slack, E-mail, and other Webhooks. Below we briefly walk through the concepts behind the alerting configuration.
Earlier, when installing stable/prometheus-operator through Helm, all we needed to do was enable alertmanager in the values.yaml passed to the chart.
alertmanager:
  ## Deploy alertmanager
  ##
  enabled: true
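For orientation, the two snippets that follow live under different keys of the same values.yaml. The layout below is a minimal sketch assuming the stable/prometheus-operator chart's key names (prometheus.prometheusSpec.additionalScrapeConfigs and alertmanager.config); verify the exact paths against your chart version.

# Sketch: where the later snippets sit inside values.yaml (key paths assumed from the chart)
alertmanager:
  enabled: true
  config: {}                      # AlertManager routing and receivers, filled in further below
prometheus:
  prometheusSpec:
    additionalScrapeConfigs: []   # extra scrape jobs, filled in in the next section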
First, define the ScrapeConfig (Targets) to be monitored on the Prometheus Server.
Monitored targets that fail the rules below will end up firing an alert event to the AlertManager service.
A job is the unit of configuration; the label rules assign and rewrite labels (relabeling).

additionalScrapeConfigs:
  - job_name: "web-service"
    scrape_interval: 15s
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: # URLs to check
          # service URLs whose availability is probed
          - https://grafana.url.com.tw
          - https://kibana.url.com.tw
          - https://prometheus.url.com.tw
          - https://es-client.url.com.tw
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
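The relabel_configs above pass the original URL to the exporter as the target parameter and point the actual scrape at blackbox-exporter on port 9115. This assumes a blackbox exporter is already running in the cluster under that service name (an assumption here, e.g. deployed from the stable/prometheus-blackbox-exporter chart) with an http_2xx module along the lines of this sketch:

# Sketch of the exporter's blackbox.yaml; http_2xx counts any 2xx response as a successful probe.
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: []   # empty list means the default, i.e. 2xx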
Next comes the configuration on the AlertManager side:
config:
  global:
    resolve_timeout: 2m # if no further notifications arrive for an alert, it is marked as resolved after this timeout
  route:
    group_by: [prometheus, alertname]
    group_wait: 30s
    group_interval: 5m # wait before notifying about new alerts added to an already-notified group
    repeat_interval: 12h # interval before re-sending notifications for alerts that are still firing
    receiver: slack-receiver
    routes:
      # Example: alerts matching alertname = Watchdog are not notified.
      - receiver: 'null'
        match:
          alertname: Watchdog
      # Example: alerts whose prometheus label matches the specified environment are notified through k8s-receiver.
      - receiver: 'k8s-receiver'
        group_wait: 30s
        match:
          prometheus: namespace/prometheus-operator-prometheus
  # Notification channels
  receivers:
    # Black hole (discards alerts)
    - name: 'null'
    # Default Slack channel
    - name: slack-receiver
      slack_configs:
        - api_url: "https://hooks.slack.com/services/dlksdjfio/sljfidjo/dlksjjsioij"
          channel: "#channel-name"
          title: "{{ .CommonAnnotations.env }}: {{ .CommonAnnotations.summary }}"
          text: "{{ range .Alerts }}{{ .Annotations.message }}\n{{ end }}"
          send_resolved: true
    # A second Slack channel
    - name: k8s-receiver
      slack_configs:
        - api_url: "https://hooks.slack.com/services/dlksdjfio/djofjido/alsjdiodjoj"
          channel: "#channel-name2"
          title: "K8s : {{ .CommonAnnotations.summary }}"
          text: "{{ range .Alerts }}{{ .Annotations.message }}\n{{ end }}"
          send_resolved: true
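The Slack templates above read .CommonAnnotations.env, .CommonAnnotations.summary and .Annotations.message, so those annotations have to be defined on the alert rules themselves. As a minimal sketch, assuming the chart's additionalPrometheusRules key and using hypothetical rule, label, and annotation values, an alert that fires when one of the blackbox probes defined earlier fails might look like this:

# Sketch only: the rule name, severity label, and annotation values are assumptions, not from the original post.
additionalPrometheusRules:
  - name: web-service-probes
    groups:
      - name: web-service.rules
        rules:
          - alert: WebServiceDown
            expr: probe_success{job="web-service"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              env: production
              summary: "HTTP probe failed"
              message: "Probe of {{ $labels.instance }} has been failing for 5 minutes."

Once such a rule fires, the route section decides which receiver handles it, and send_resolved: true makes Slack receive a follow-up message when the alert clears.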